Robust Hyperlinks Cost Just Five Words Each
Abstract
We propose robust hyperlinks as a solution to the problem of broken hyperlinks. A robust hyperlink is a URL augmented with a small "signature", computed from the referenced document. The signature can be submitted as a query to web search engines to locate the document. It turns out that very small signatures are sufficient to readily locate individual documents out of the many millions on the web. Robust hyperlinks exhibit a number of desirable qualities: They can be computed and exploited automatically, are small and cheap to compute (so that it is practical to make all hyperlinks robust), do not require new server or infrastructure support, can be rolled out reasonably well in the existing URL syntax, can be used to automatically retrofit existing links to make them robust, and are easy to understand. In particular, one can start using robust hyperlinks now, as servers and web pages are mostly compatible as is, while clients can increase their support in the future. Robust hyperlinks are one example of using the web to bootstrap new features onto itself.
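As a rough illustration of the idea in the abstract, the sketch below computes a five-word signature from a document's text and appends it to a URL as a query parameter. The abstract does not say how the words are chosen or how they are encoded in the URL, so everything specific here is an assumption: the TF-IDF ranking, the `doc_freq` table and `n_docs` count (standing in for statistics a search engine would have), the function names, and the `lexical-signature` parameter name.

```python
import math
import re
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def signature(text, doc_freq, n_docs, k=5):
    """Return the k highest-scoring terms of text (assumed TF-IDF heuristic)."""
    terms = re.findall(r"[a-z]+", text.lower())
    tf = Counter(terms)

    def score(term):
        # Terms missing from doc_freq are treated as occurring in one document.
        return tf[term] * math.log(n_docs / doc_freq.get(term, 1))

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(tf, key=lambda t: (-score(t), t))
    return ranked[:k]


def make_robust(url, words, param="lexical-signature"):
    """Append the signature words to url under a hypothetical query parameter."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, " ".join(words)))
    # urlencode encodes the spaces between words as '+'.
    return urlunsplit(parts._replace(query=urlencode(query)))


# Toy document-frequency table; the numbers are made up for illustration.
doc_freq = {"the": 1000, "of": 1000, "a": 1000, "web": 400,
            "broken": 30, "hyperlink": 20, "robust": 15, "signature": 10}
text = "the robust hyperlink signature of the broken web hyperlink a"
words = signature(text, doc_freq, n_docs=1000)
robust_url = make_robust("http://example.com/page.html", words)
```

If the plain URL later breaks, a client could in principle submit the five words to a search engine to look for the moved document. Because the signature travels as an ordinary query parameter, the link stays within existing URL syntax, which is the property the abstract emphasizes.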
Similar resources
Combining Mention Context and Hyperlinks from Wikipedia for Named Entity Disambiguation
Named entity disambiguation is the task of linking entity mentions to their intended referent, as represented in a Knowledge Base, usually derived from Wikipedia. In this paper, we combine local mention context and global hyperlink structure from Wikipedia in a probabilistic framework. Our results show that the two models of context, namely, words in the context and hyperlink pathways to other ...
A Robust Scenario Based Approach in an Uncertain Condition Applied to Location-Allocation Distribution Centers Problem
The paper discusses the location-allocation model for logistic networks and distribution centers through considering uncertain parameters. In real-world cases, demands and transshipment costs change over the period of the time. This may lead to large cost deviation in total cost. Scenario based robust optimization approaches are proposed where occurrence probability of each scenario is not know...
Automatically Labeling Web Pages Based on Normal User Actions
For agents attempting to learn a user’s interests, the cost of obtaining labeled training instances is prohibitive because the user must directly label each training instance, and few users are willing to do so. We present an approach that circumvents the need for human-labeled pages. Instead, we learn ‘surrogate’ tasks where the desired output is easily measured, such as the number of hyperlin...
Finding Centuries-Old Hyperlinks: a Novel Semi-Supervised Shape Classifier
Hyperlinks are so useful for searching and browsing modern digital collections that researchers have long wondered whether it is possible to retroactively add hyperlinks to digitized historical documents. There has already been significant research into this endeavor for historical text; however, in this work we consider the problem of adding hyperlinks among graphic elements. While such a system ...
Extracting Related Words from Anchor Text Clusters by Focusing on the Page Designer's Intention
Approaches that extract related words (terms) by co-occurrence sometimes work poorly. Two words that frequently co-occur in the same documents are considered related. However, they may not be related at all, because they may share neither a common meaning nor similar semantics. We address this problem by considering the page designer’s intention and propose a new model to extract related words. Our appr...